近年来,隐性神经表示(INR)已经出现并显示了其比离散表示的好处。但是,将INR拟合到给定的观察结果通常需要从头开始优化,这是效率低下的,并且在稀疏观测值中概括不佳。为了解决这个问题,大多数先前的作品都会训练一个生成单个向量来调节INR权重的超网络,其中单个向量成为限制输出INR重建精度的信息瓶颈。最近的工作表明,可以通过基于梯度的元学习来精确地推断出INR中的整个权重。由基于梯度的元学习的广义公式的动机,我们提出了一种使用变压器作为INRS的超网络的公式,它可以直接使用专门为设置映射的变压器构建整个INR权重。我们证明了我们在不同任务和域中构建INR的方法的有效性,包括2D图像回归和3D对象的查看合成。我们的工作吸引了变压器超网与基于梯度的元学习算法之间的连接,我们提供了进一步的分析,以理解生成的INRS。带代码的项目页面位于\ url {https://yinboc.github.io/trans-inr/}。
translated by 谷歌翻译
在语义细分中,将高级上下文信息与低级详细信息集成至关重要。为此,大多数现有的分割模型都采用双线性启动采样和卷积来具有不同尺度的地图,然后以相同的分辨率对齐。但是,双线性启动采样模糊了这些特征地图和卷积中所学到的精确信息,这会产生额外的计算成本。为了解决这些问题,我们提出了隐式特征对齐函数(IFA)。我们的方法的灵感来自隐式神经表示的快速扩展的主题,在该主题中,基于坐标的神经网络用于指定信号字段。在IFA中,特征向量被视为表示2D信息字段。给定查询坐标,附近的具有相对坐标的特征向量是从多级特征图中获取的,然后馈入MLP以生成相应的输出。因此,IFA隐含地将特征图在不同级别对齐,并能够在任意分辨率中产生分割图。我们证明了IFA在多个数据集上的功效,包括CityScapes,Pascal环境和ADE20K。我们的方法可以与各种体系结构的改进结合使用,并在共同基准上实现最新的计算准确性权衡。代码将在https://github.com/hzhupku/ifa上提供。
translated by 谷歌翻译
视频通常将流和连续的视觉数据记录为离散的连续帧。由于存储成本对于高保真度的视频来说是昂贵的,因此大多数存储以相对较低的分辨率和帧速率存储。最新的时空视频超分辨率(STVSR)的工作是开发出来的,以将时间插值和空间超分辨率纳入统一框架。但是,其中大多数仅支持固定的上采样量表,这限制了其灵活性和应用。在这项工作中,我们没有遵循离散表示,我们提出了视频隐式神经表示(videoinr),并显示了其对STVSR的应用。学到的隐式神经表示可以解码为任意空间分辨率和帧速率的视频。我们表明,Videoinr在常见的上采样量表上使用最先进的STVSR方法实现了竞争性能,并且在连续和训练的分布量表上显着优于先前的作品。我们的项目页面位于http://zeyuan-chen.com/videoinr/。
translated by 谷歌翻译
How to represent an image? While the visual world is presented in a continuous manner, machines store and see the images in a discrete way with 2D arrays of pixels. In this paper, we seek to learn a continuous representation for images. Inspired by the recent progress in 3D reconstruction with implicit neural representation, we propose Local Implicit Image Function (LIIF), which takes an image coordinate and the 2D deep features around the coordinate as inputs, predicts the RGB value at a given coordinate as an output. Since the coordinates are continuous, LIIF can be presented in arbitrary resolution. To generate the continuous representation for images, we train an encoder with LIIF representation via a self-supervised task with superresolution. The learned continuous representation can be presented in arbitrary resolution even extrapolate to ×30 higher resolution, where the training tasks are not provided. We further show that LIIF representation builds a bridge between discrete and continuous representation in 2D, it naturally supports the learning tasks with size-varied image ground-truths and significantly outperforms the method with resizing the ground-truths. Our project page with code is at https://yinboc.github.io/liif/.
translated by 谷歌翻译
Multivariate time series forecasting with hierarchical structure is pervasive in real-world applications, demanding not only predicting each level of the hierarchy, but also reconciling all forecasts to ensure coherency, i.e., the forecasts should satisfy the hierarchical aggregation constraints. Moreover, the disparities of statistical characteristics between levels can be huge, worsened by non-Gaussian distributions and non-linear correlations. To this extent, we propose a novel end-to-end hierarchical time series forecasting model, based on conditioned normalizing flow-based autoregressive transformer reconciliation, to represent complex data distribution while simultaneously reconciling the forecasts to ensure coherency. Unlike other state-of-the-art methods, we achieve the forecasting and reconciliation simultaneously without requiring any explicit post-processing step. In addition, by harnessing the power of deep model, we do not rely on any assumption such as unbiased estimates or Gaussian distribution. Our evaluation experiments are conducted on four real-world hierarchical datasets from different industrial domains (three public ones and a dataset from the application servers of Alipay's data center) and the preliminary results demonstrate efficacy of our proposed method.
translated by 谷歌翻译
Image super-resolution (SR) is a technique to recover lost high-frequency information in low-resolution (LR) images. Spatial-domain information has been widely exploited to implement image SR, so a new trend is to involve frequency-domain information in SR tasks. Besides, image SR is typically application-oriented and various computer vision tasks call for image arbitrary magnification. Therefore, in this paper, we study image features in the frequency domain to design a novel scale-arbitrary image SR network. First, we statistically analyze LR-HR image pairs of several datasets under different scale factors and find that the high-frequency spectra of different images under different scale factors suffer from different degrees of degradation, but the valid low-frequency spectra tend to be retained within a certain distribution range. Then, based on this finding, we devise an adaptive scale-aware feature division mechanism using deep reinforcement learning, which can accurately and adaptively divide the frequency spectrum into the low-frequency part to be retained and the high-frequency one to be recovered. Finally, we design a scale-aware feature recovery module to capture and fuse multi-level features for reconstructing the high-frequency spectrum at arbitrary scale factors. Extensive experiments on public datasets show the superiority of our method compared with state-of-the-art methods.
translated by 谷歌翻译
Deep learning models can achieve high accuracy when trained on large amounts of labeled data. However, real-world scenarios often involve several challenges: Training data may become available in installments, may originate from multiple different domains, and may not contain labels for training. Certain settings, for instance medical applications, often involve further restrictions that prohibit retention of previously seen data due to privacy regulations. In this work, to address such challenges, we study unsupervised segmentation in continual learning scenarios that involve domain shift. To that end, we introduce GarDA (Generative Appearance Replay for continual Domain Adaptation), a generative-replay based approach that can adapt a segmentation model sequentially to new domains with unlabeled data. In contrast to single-step unsupervised domain adaptation (UDA), continual adaptation to a sequence of domains enables leveraging and consolidation of information from multiple domains. Unlike previous approaches in incremental UDA, our method does not require access to previously seen data, making it applicable in many practical scenarios. We evaluate GarDA on two datasets with different organs and modalities, where it substantially outperforms existing techniques.
translated by 谷歌翻译
The development of social media user stance detection and bot detection methods rely heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, suppressing graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built based on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain and user tweet features as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when introducing multiple relations. By analyzing experiment results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
translated by 谷歌翻译
As one of the prevalent methods to achieve automation systems, Imitation Learning (IL) presents a promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, the corresponding research on the explainability of IL models is still limited. Inspired by the recent approaches in explainable artificial intelligence methods, we proposed a model-agnostic explaining framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in demonstrations. It iteratively retrains the black-box IL model from the randomized masked demonstrations and uses the conventional evaluation outcome environment returns as the coefficient to build an importance map. We also conducted experiments to investigate three major questions concerning frames' importance equality, the effectiveness of the importance map, and connections between importance maps from different IL models. The result shows that R2RISE successfully distinguishes important frames from the demonstrations.
translated by 谷歌翻译
Compressed videos often exhibit visually annoying artifacts, known as Perceivable Encoding Artifacts (PEAs), which dramatically degrade video visual quality. Subjective and objective measures capable of identifying and quantifying various types of PEAs are critical in improving visual quality. In this paper, we investigate the influence of four spatial PEAs (i.e. blurring, blocking, bleeding, and ringing) and two temporal PEAs (i.e. flickering and floating) on video quality. For spatial artifacts, we propose a visual saliency model with a low computational cost and higher consistency with human visual perception. In terms of temporal artifacts, self-attention based TimeSFormer is improved to detect temporal artifacts. Based on the six types of PEAs, a quality metric called Saliency-Aware Spatio-Temporal Artifacts Measurement (SSTAM) is proposed. Experimental results demonstrate that the proposed method outperforms state-of-the-art metrics. We believe that SSTAM will be beneficial for optimizing video coding techniques.
translated by 谷歌翻译